Google Search Education




Help your students become better searchers


Web search can be a remarkable tool for students, and a bit of
instruction in how to search for academic sources will help your
students become critical thinkers and independent learners.


With the materials on this site, you can help your students become
skilled searchers, whether they're just starting out with search or
ready for more advanced training.
Lesson Plans & Activities: Download lesson plans to develop your students' search literacy skills.
Power Searching: Improve your search skills and learn advanced tips with online lessons and activities.
A Google a Day Challenges: Put your students' search skills to the test with these trivia challenges.
Live Trainings: Join us for live search trainings or watch past trainings from search experts here at Google.


Introduction 


Without search engines the web wouldn't scale
• No incentive in creating content unless it can be found
• Taxonomies, bookmarks can't keep up
  – Or can they? (del.icio.us)
• The web is both a technology artifact and a social environment
  – "The Web has become the 'new normal' in the American way of life; those who don't go online constitute an ever-shrinking minority." [Pew Foundation report, January 2005]


Introduction 


Without search engines the web wouldn't scale
• Search engines make aggregation of interest possible:
  – Create incentives for very niche players
    – Economic: specialized stores, providers, etc.
    – Social: narrow interests, specialized communities
• The acceptance of search interaction makes "unlimited selection" stores possible
  – Amazon, Netflix, etc.


Introduction 


Without search engines the web wouldn't scale
• Search turned out to be the best mechanism for advertising on the web
  – Growing very fast (though the entire US advertising industry is $250 billion)
  – A $15 billion plus industry in 2009
  – $36 billion in 2012


Overview 


• Introduction
• Classic Information Retrieval
• Web IR
• Sponsored Search
• Web Search Basics
• Size of the Web
• Web Users
• Spam


Classic Information Retrieval 


Classic IR assumptions 


• Corpus: Fixed document collection
• Goal: Retrieve information content relevant to an information need


Classic Information Retrieval 


Classic IR Goal 


• Classic "relevance"
  – For each query Q and stored document D in a corpus, there exists a relevance score R(Q,D)
  – R(Q,D) is averaged over users U and contexts C
• Maximize R(Q,D) instead of R(Q,D,U,C)
  – Context is ignored
  – Individuals are ignored
  – Corpus is static
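To make a context-free R(Q,D) concrete, here is a minimal sketch that uses TF-IDF as the scoring function. TF-IDF is our illustrative choice, not something the slide specifies, and the three-document corpus is hypothetical. Note that the scorer takes only the query, the document, and the fixed corpus: there is no user or context argument, which is exactly the simplification described above.

```python
import math
from collections import Counter


def tf_idf_score(query: str, doc: str, corpus: list[str]) -> float:
    """Context-free relevance R(Q, D): sum of tf * idf over the query terms.

    TF-IDF is an assumed stand-in for R; the corpus is treated as static.
    """
    n_docs = len(corpus)
    doc_terms = Counter(doc.lower().split())
    score = 0.0
    for term in set(query.lower().split()):
        # Document frequency: how many corpus documents contain the term.
        df = sum(1 for d in corpus if term in d.lower().split())
        if df == 0 or term not in doc_terms:
            continue
        score += doc_terms[term] * math.log(n_docs / df)
    return score


def rank(query: str, corpus: list[str]) -> list[str]:
    """Order every document in the fixed corpus by R(Q, D), highest first."""
    return sorted(corpus, key=lambda d: tf_idf_score(query, d, corpus), reverse=True)


# Hypothetical corpus for illustration.
corpus = [
    "web search engines index the web",
    "classic retrieval uses a fixed collection",
    "ads fund web search",
]
print(rank("web search", corpus))
```

Every user issuing "web search" against this corpus gets the same ordering, whatever their context.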




Web Information Retrieval 


Web IR: Differences from traditional IR 
• On the web, search and ads are intricately connected
• The web is huge
• The web is a rapidly changing collection
• There is spam on the web
  – Adversarial IR
  – Huge difference from traditional IR
• One interface for hugely divergent needs
  – Queries, maps, stocks, weather, calculations


Web Information Retrieval 


History 
• Early keyword-based engines
  – (1995-1997) AltaVista, Excite, Infoseek, Inktomi
• Paid placement ranking
  – Goto.com → Overture.com → Yahoo!
  – Results based on auction for keyword placement


[Screenshot: Goto.com paid-placement results for a Wilmington real estate query, each listing annotated with its cost to the advertiser]


Web Information Retrieval 


History 
• (1998+) Link-based ranking pioneered by Google
  – Links added the idea of "authoritativeness" to "relevance"
  – Blew away all early engines save Inktomi
  – Great user experience looking for a business model
• Meanwhile Goto/Overture's annual revenues were nearing $1 billion
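The best-known form of link-based "authoritativeness" is PageRank: a page is authoritative if authoritative pages link to it. Below is a minimal power-iteration sketch, assuming a tiny hypothetical link graph, a damping factor of 0.85, and that pages with no out-links spread their rank evenly over all pages. It illustrates the idea only; it is not Google's production ranking.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a mapping page -> pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: three pages link to "hub", which links back to "a".
graph = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
scores = pagerank(graph)
print(sorted(scores, key=scores.get, reverse=True))
```

"hub" ends up on top because three pages vote for it, and "a" outranks "b" and "c" because the authoritative hub links to it.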


Web Information Retrieval 


History 
• Result
  – Google:
    – Added paid placement ads on the side
    – Differentiated from search results
  – Yahoo! built a similar architecture
    – Buys Overture for paid placement
    – Buys Inktomi for search


History 
• 2004
  – Microsoft begins in-house development of a search engine called Live
• May 28, 2009
  – Microsoft rebrands Live Search as Bing
• Search engine wars intensify
  – New innovations appear at every turn
  – Technology becomes much more closely guarded


Web Information Retrieval 


History 
• Internationally
  – Chinese search engine Baidu "owns" Chinese search
  – Launched around 2000, specializes in Chinese content
[Screenshot: Baidu homepage]


Web Information Retrieval 


Today (1/7/2014) 
1. google.com        11. sina.com.cn      21. wordpress.com
2. facebook.com      12. twitter.com      23. bing.com
3. youtube.com       13. hao123.com       30. pinterest.com
4. yahoo.com         14. 163.com          32. msn.com
5. baidu.com         15. blogspot.com     34. tumblr.com
6. wikipedia.org     16. google.co.in     38. instagram.com
7. qq.com            17. linkedin.com     39. paypal.com
8. taobao.com        18. weibo.com        41. «porn»
9. amazon.com        19. tmall.com        43. apple.com
10. live.com         20. ebay.com         50. «porn»


Web Information Retrieval 


Before (1/7/2010) 
1. google.com        13. twitter.com
2. facebook.com      15. google.cn
3. youtube.com       22. bing.com
4. yahoo.com         29. google.co.jp
5. live.com          56. ask.com
6. wikipedia.org     64. cnn.com
7. blogger.com
8. baidu.com
9. msn.com
10. yahoo.co.jp




Sponsored Search 


[Screenshot: Google results page for the query "search engine optimization", with the Sponsored Links (ads) shown alongside the algorithmic results]


Sponsored Search 


Ads vs. Search Results

• Google maintains that ads (based on vendors bidding for search queries) do not affect vendors' ranking in search results

[Screenshot: the same results page, with the Sponsored Links column kept separate from the organic search results]
Sponsored Search


Ranking of ads 


• Goto model:
  – Rank according to how much the advertiser pays
• Current model:
  – Balance auction price and relevance
  – Irrelevant ads (few click-throughs)
    – Decrease opportunities for relevant ads
    – Harm the user experience
• Idea: Well-targeted advertising is good for everyone
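The contrast between the two models can be sketched in a few lines. One common way to "balance auction price and relevance" is to order ads by bid times estimated click-through rate (expected revenue per impression); that formula, the `Ad` type, and all numbers below are assumptions for illustration, not any engine's actual ranking function.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    bid: float       # what the advertiser offers to pay per click
    est_ctr: float   # hypothetical estimated click-through rate (relevance proxy)


def rank_goto(ads: list[Ad]) -> list[Ad]:
    """Goto-style: order purely by how much the advertiser pays."""
    return sorted(ads, key=lambda ad: ad.bid, reverse=True)


def rank_balanced(ads: list[Ad]) -> list[Ad]:
    """Assumed balanced model: order by bid * estimated CTR."""
    return sorted(ads, key=lambda ad: ad.bid * ad.est_ctr, reverse=True)


ads = [
    Ad("irrelevant-but-rich", bid=2.00, est_ctr=0.005),
    Ad("well-targeted", bid=0.80, est_ctr=0.040),
    Ad("middling", bid=1.20, est_ctr=0.015),
]
print([ad.advertiser for ad in rank_goto(ads)])
print([ad.advertiser for ad in rank_balanced(ads)])
```

Under pure bidding the irrelevant ad wins the top slot; once click-throughs are factored in, the well-targeted ad rises to the top, which is the "good for everyone" point on the slide.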


Sponsored Search 


Paying for advertisements: terms
• CPM
  – "Cost Per Mille" (per thousand impressions)
  – Pay for 1000 eyeballs
  – Important for branding campaigns
• CPC
  – "Cost Per Click"
  – Pay for clicks on ads
  – Important for sales campaigns
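The two pricing models differ only in what is counted: impressions (in thousands) for CPM, clicks for CPC. A minimal sketch, assuming a hypothetical campaign of 200,000 impressions at a 2% click-through rate, shows how to compare them and where they break even.

```python
def cpm_cost(impressions: int, cpm_rate: float) -> float:
    """CPM: the advertiser pays cpm_rate for every 1,000 impressions."""
    return impressions / 1000 * cpm_rate


def cpc_cost(impressions: int, ctr: float, cpc_rate: float) -> float:
    """CPC: only clicks are billed; clicks = impressions * click-through rate."""
    return impressions * ctr * cpc_rate


def break_even_cpc(cpm_rate: float, ctr: float) -> float:
    """CPC at which both models cost the same for a given click-through rate."""
    return cpm_rate / (1000 * ctr)


# Hypothetical campaign: 200,000 impressions, $5 CPM, 2% CTR, $0.25 CPC.
print(cpm_cost(200_000, 5.0))
print(cpc_cost(200_000, 0.02, 0.25))
print(break_even_cpc(5.0, 0.02))
```

At a 2% click-through rate, a $5 CPM and a $0.25 CPC both cost $1,000. If the real CTR turns out lower, CPC is cheaper for the advertiser, because the risk of an ad nobody clicks shifts to the search engine.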




[Figure: The Web]


Web Search Basics 


The Web Corpus 


• No design/coordination
• Distributed content creation, linking
  – "Democratization of publishing"
• Content includes truth, lies, contradictions, etc.
• Unstructured data (text, HTML)
• Semi-structured data (XML, annotated photos)
• Structured data (databases)
• Scale is much larger than previous text corpora


Web Search Basics 


The Web Corpus 


• Growth: slowing from "doubling every few months", but still expanding


[Figure: The Web]


Web Search Basics 


Dynamic Content 
• Content can be dynamically generated
  – There is no static HTML version
  – Flight status information, Evite responses
  – Assembled on request ("?" in URL is a clue)
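The "?" clue is easy to check mechanically. A minimal sketch using Python's standard `urllib.parse`: a URL with a non-empty query string is flagged as probably assembled on request. The example URLs are hypothetical, and the heuristic misses pages that are generated on the fly behind clean-looking paths.

```python
from urllib.parse import parse_qs, urlsplit


def looks_dynamic(url: str) -> bool:
    """Heuristic: a query string ("?") suggests the page is assembled on request."""
    return bool(urlsplit(url).query)


def query_parameters(url: str) -> dict[str, list[str]]:
    """The parameters a server would use to assemble the page."""
    return parse_qs(urlsplit(url).query)


# Hypothetical URLs for illustration.
print(looks_dynamic("http://example.com/status?flight=AA715"))  # True
print(looks_dynamic("http://example.com/about.html"))           # False
print(query_parameters("http://example.com/status?flight=AA715"))
```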
[Figure: The Web, databases, and the user; example of a dynamically generated page: status of Flight AA715 (photo: flickr:crankyT)]
Web Search Basics


Dynamic Content
• Most (truly) dynamic content is ignored by search engines
  – Too much to index
  – Static information is more important for search
  – Spider traps look dynamic
• Actually, a lot of "static" content is assembled on the fly, too
  – ASP, PHP, JSP, ads, etc.


Web Search Basics 


The Web as a graph 


• Web pages are nodes
• Hyperlinks are directed edges
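A natural in-memory representation of this graph is an adjacency list: each page maps to the pages it links to. The sketch below, over a hypothetical four-page graph, computes out-degree (links on a page), in-degree (links pointing at a page), and the mean number of links per page, the statistic behind "high linkage" figures for the web.

```python
# Hypothetical toy web graph: each page maps to the pages it links to.
links: dict[str, list[str]] = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}


def out_degree(graph: dict[str, list[str]]) -> dict[str, int]:
    """Number of directed edges (hyperlinks) leaving each page."""
    return {page: len(targets) for page, targets in graph.items()}


def in_degree(graph: dict[str, list[str]]) -> dict[str, int]:
    """Number of directed edges (hyperlinks) pointing at each page."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def mean_links_per_page(graph: dict[str, list[str]]) -> float:
    return sum(out_degree(graph).values()) / len(graph)


print(in_degree(links))            # {'a.html': 1, 'b.html': 1, 'c.html': 3, 'd.html': 0}
print(mean_links_per_page(links))  # 1.25
```

Because edges are directed, "d.html" links out but nothing links to it, so a crawler starting from "a.html" would never discover it.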
Web Search Basics


Characteristics of the web
• Significant duplication
  – 30%-40% in some studies [Brod97, Shiv99]
  – www.copyscape.com
• High linkage
  – More than 8 links per page on average
• Spam
  – Billions of pages of it
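Duplicate measurements like those in [Brod97] rest on shingling: represent each document by its set of contiguous k-word sequences and compare sets with the Jaccard coefficient. The sketch below computes the exact coefficient; the shingle size k = 4 and the 0.9 near-duplicate threshold are assumptions for illustration. Broder's method additionally samples shingles (min-hashing) so the comparison scales to billions of pages.

```python
def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    """Set of k-word shingles (contiguous word sequences) in a document."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a: str, b: str, k: int = 4) -> float:
    """Jaccard coefficient of the two documents' shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a: str, b: str, threshold: float = 0.9, k: int = 4) -> bool:
    """Hypothetical cutoff: treat documents at or above the threshold as duplicates."""
    return resemblance(a, b, k) >= threshold


doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox jumps over the lazy cat"
print(resemblance(doc1, doc2))  # 5 shared shingles out of 7 distinct ones
```

Changing a single word alters every shingle that covers it, so even near-identical short pages can fall below a strict threshold; longer pages dilute the effect.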


Web Search Basics 


[Figure: The user, with a results page for the query "search engine optimization" showing sponsored links beside the search results (photo: flickr:crankyT)]


[Figure: The Web, indices, and ad indices]


Size of the Web 


How big is the web?
• What is measured?
  – Number of hosts
  – Number of "static" HTML pages
• Number of hosts: Netcraft survey
  – http://news.netcraft.com/archives/web_server_survey.html
  – Monthly report on hosts and servers
• Number of pages
  – Lots of estimates, which warrant further discussion
Size of the Web


How big is the web?
• Netcraft Web Server Survey
[Figure: Total number of websites (linear scale)]


Nerd Trivia
[Figure: Web server developers: market share of all sites; legend includes Apache]


Size of the Web


Rate of change 
• [Cho00] 720K pages from 270 popular sites sampled daily for 5 months in 1999
  – 40% changed weekly, 23% daily
• [Fett02] Massive study: 151M pages checked over a few months
  – Significant changes: 7% weekly
  – Any change: 25% weekly


Size of the Web 


Rate of change 


• [Ntou04] 154 large sites recrawled from scratch weekly
  – 8% new pages every week
  – 8% die
  – 5% new content
  – 25% new links per week
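Rates like these come from comparing successive crawls of the same sites. A minimal sketch, assuming two hypothetical weekly snapshots stored as URL-to-content maps: the change rate counts surviving URLs whose content digest differs (an "any change" measure), and the birth and death rates count URLs that appear or vanish, relative to the earlier crawl. Distinguishing "significant" from trivial changes would need a similarity threshold rather than an exact digest comparison.

```python
import hashlib


def digest(content: str) -> str:
    """Content fingerprint, so a crawler can compare versions without keeping full pages."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def change_rate(old: dict[str, str], new: dict[str, str]) -> float:
    """Fraction of URLs present in both crawls whose content changed at all."""
    common = old.keys() & new.keys()
    if not common:
        return 0.0
    changed = sum(1 for url in common if digest(old[url]) != digest(new[url]))
    return changed / len(common)


def birth_death_rates(old: dict[str, str], new: dict[str, str]) -> tuple[float, float]:
    """Share of newly appearing and of vanished URLs, relative to the earlier crawl."""
    born = new.keys() - old.keys()
    died = old.keys() - new.keys()
    return len(born) / len(old), len(died) / len(old)


# Hypothetical weekly snapshots: URL -> page text.
week1 = {"/a": "alpha", "/b": "beta", "/c": "gamma", "/d": "delta", "/e": "epsilon"}
week2 = {"/a": "alpha", "/b": "beta v2", "/c": "gamma", "/d": "delta", "/f": "new page"}

print(change_rate(week1, week2))        # 0.25: one of the four surviving pages changed
print(birth_death_rates(week1, week2))  # (0.2, 0.2)
```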


Rate of change 
• Fetterly et al. study in 2002
  – 150 million pages over 11 weekly crawls
  – Bucketed into 85 groups according to amount of …
Size of the Web


Web Evolution
• The nature of the web is change
• Not much work on studying web evolution
  – Exception is Fetterly et al., 2003
• Some effort has been made to extrapolate from small samples using fractal models [Dill et al. 2001]


Nature of the Web 


The very nature of the web is changing as well
• Transforming from a source of information
  – to what?
    – a communication platform?
    – a source of computation?
    – an application-space?
    – a mirror-world?
    – an augmentation of reality?
    – a cognitive orthotic?


